Querying Semi-Structured Data
Abstract
The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured data in relational database systems. Data is accessible through a variety of interfaces, including Web browsers, database query languages, application-specific interfaces, and data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure, even if the structure is often implicit and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes it exists but we prefer to ignore it for certain purposes, such as browsing. We call semi-structured data the data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later, when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent work at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data. The "lightweight" data models they use (based on labelled graphs) are very similar. As we shall see, the topic of semi-structured data has no precise boundary. Furthermore, a theory of semi-structured data is still missing.
We will try to highlight some important issues in this context. The paper is organized as follows. In Section 2, we discuss the particularities of semi-structured data. In Section 3, we consider the issue of the data structure and in Section 4, the issue of the query language.
Related resources
Querying Web Forms and Nested Semi-structured Data
Semi-structured data are commonly represented by labelled graphs. These graphs can be simple or nested. In this paper we present how to model nested semi-structured data in the presence of Web forms. Our motivation is to make our data model more realistic, so that it captures the richness of Web data. The main purpose of the paper is to provide a mechanism to query nested semi-structured data and web fo...
Querying Trees with Pointers
We introduce a data model for semi-structured data and explore a spatial logic for reasoning about this model. This note is part of an on-going project to develop a pattern-matching language for analysing and manipulating semi-structured data. This work was first reported in Appsem 2001 and the session on spatial logic at MFPS 2002.
ReQueSS: Relational Querying of Semi-Structured Data
We present a prototype of a Web querying interface which is capable of searching and querying unified Web sources of data that have sufficient hidden relational structure. The system converts query-related parts of Web pages into relational data and provides for SQL-like or QBE-like querying capability. The relational query is parsed for relevant information such as selection conditions and tab...
Publication date: 1997